Skip to content

End-to-end support for coarser-than-daily count signals#794

Open
cdc-mitzimorris wants to merge 53 commits intomainfrom
mem_789_temporal_aggregation
Open

End-to-end support for coarser-than-daily count signals#794
cdc-mitzimorris wants to merge 53 commits intomainfrom
mem_789_temporal_aggregation

Conversation

@cdc-mitzimorris
Copy link
Copy Markdown
Collaborator

@cdc-mitzimorris cdc-mitzimorris commented Apr 21, 2026

Overview

Adds support for count observations aggregated to a weekly grid while the renewal equation continues to be evaluated daily. Two design pieces:

  1. Weekly observation likelihood path. Daily predicted counts are summed to the weekly observation grid before the likelihood is scored. Aggregation lives inside the numpyro-traced graph so the noise model and the likelihood operate on the same scale.
  2. Independent parameter cadence. The temporal process for $\mathcal{R}(t)$ may be parameterized daily or weekly (or any stepwise cadence) and is broadcast to the daily axis. The renewal equation and all observation transforms run daily either way.

End-to-end flow for a weekly signal

  • $\mathcal{R}(t)$ sampled by a temporal process at the chosen cadence and broadcast to the daily model axis if needed.
  • Daily renewal equation consumes daily $\mathcal{R}(t)$ and produces the daily infection trajectory.
  • Daily predicted counts computed via ascertainment + delay convolution.
  • Daily predicted counts summed to the weekly observation grid via pyrenew.time.daily_to_weekly.
  • Likelihood scored at weekly scale against weekly observations.

Relationship to pyrenew-hew

The production pyrenew-hew model parameterizes $\mathcal{R}(t)$ weekly and aggregates daily predicted hospital admissions to a weekly grid. This PR brings the same capability into PyRenew while making parameter cadence a user choice rather than a fixed coupling:

  • Match pyrenew-hew today: wrap the inner temporal process in StepwiseTemporalProcess(step_size=7, alignment="calendar_week", week_start_dow=...).
  • Daily parameterization for the same signals: drop the wrapper and use the inner process directly.
  • Mixed cadences: weekly hospital admissions and daily ED visits in one model regardless of the Rt parameter cadence.

Observation cadence and parameter cadence are independent design choices; the builder no longer enforces a pairing rule between them.

Reviewer guide

Review bottom-up through the dependency chain. Each unit's changes are self-contained.

1. Synthetic data refresh (120 → 126 days)

  • Files: datagen_he_CA_126.py, synthetic_CA_126/*.csv, synthetic_data.py, test_datagen_he_CA_126.py, test_datasets_synthetic.py.
  • Six extra days yields 18 full weeks. No API change.
  • Verify true_parameters.json matches the generating process.

2. Observation base validators — pyrenew/observation/base.py

  • New: _validate_aggregation_params, _compute_period_offset, _validate_period_end_times.
  • Generalized _validate_shapes_match replaces _validate_obs_times_shape.
  • _validate_dow_effect now uses the shared require_shape helper.
  • Focus: offset arithmetic (period_end_dow + 1 - first_day_dow) % 7 and period-boundary alignment.

3. Latent: temporal processes — pyrenew/latent/temporal_processes.py

  • New class StepwiseTemporalProcess; step_size attr added to the TemporalProcess Protocol and to AR1 / DifferencedAR1 / RandomWalk.
  • Alignment options: "model_index" (default) starts blocks at model index 0; "calendar_week" aligns weekly blocks to a declared week_start_dow.
  • Calendar-week broadcast delegates to pyrenew.time.weekly_to_daily; coarse trajectory recorded as {name_prefix}_coarse.
  • first_day_dow threaded through the Protocol so calendar-aligned wrappers can use the model-axis day-of-week; standard processes ignore it.

4. Latent: shape contracts — pyrenew/latent/{population,subpopulation}_infections.py

  • Both sample() methods accept first_day_dow and forward it to their temporal processes.
  • Output shape of every temporal process is validated via the shared pyrenew.arrayutils.require_shape helper.

5. Count observation path — pyrenew/observation/count_observations.py

  • New ctor params aggregation_period, reporting_schedule, period_end_dow.
  • New _aggregate wraps pyrenew.time.daily_to_weekly.
  • Branching in validate_data / sample for regular vs. irregular schedules.
  • Confirm all in-tree SubpopulationCounts call sites updated; timesperiod_end_times rename consistent.

6. Measurement observations — pyrenew/observation/measurement_observations.py

  • One-line refactor to use the renamed shape validator.

7. Builder + model coherence — pyrenew/model/{pyrenew_builder,multisignal_model}.py

  • MultiSignalModel.sample() accepts first_day_dow and forwards it to the latent process.
  • _validate_coherence enforces calendar-anchor and structural coherence:
    • All observations sharing an aggregation_period > 1 must agree on period_end_dow.
    • Every temporal-process step_size must be a positive integer.
    • Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.
  • first_day_dow required at validate_data when any obs has aggregation_period > 1.
  • The previous "step_size ≤ finest aggregation_period" rule was deliberately removed — see the in-body comment.

8. Integration test — test/integration/test_population_infections_he_weekly.py

  • End-to-end MCMC: weekly admissions + daily ED visits, calendar-week-aligned weekly Rt.
  • Prior-predictive shape check and posterior recovery for I0 + baseline R(t).
  • Uses numpyro.enable_x64() + set_host_device_count(4).

9. Test fixtures + config — test/conftest.py, pyproject.toml, _typos.toml

  • 196-line conftest promotes repeated setup to fixtures, including reusable temporal-process stubs (WrongShapeTemporalProcess, ConstantTemporalProcess, InvalidStepSizeTemporalProcess).
  • Pytest integration marker added so -m "not integration" skips MCMC tests.

10. Tutorial — docs/tutorials/building_multisignal_models.qmd

  • New section explaining the three independent choices: parameter cadence, model time axis, observation cadence.
  • Worked example pairing weekly Rt (alignment="calendar_week", week_start_dow=6) with weekly observations (period_end_dow=5).

Where to focus review attention

  • Offset math in _compute_period_offset and the period-boundary check in _validate_period_end_times — these govern correctness of weekly alignment.
  • StepwiseTemporalProcess calendar-week broadcasting — weekly_to_daily is reused; sanity check with n_timepoints=17, first_day_dow=3, week_start_dow=6 produces 3 coarse samples broadcast to [c0×3, c1×7, c2×7].
  • The three coherence rules in _validate_coherence — each has pass + distinct-failure tests in test_pyrenew_builder.py.
  • CountObservation._aggregate runs inside the numpyro-traced graph — likelihood scoring is at weekly scale, not post-hoc.

Incidental fixes (not directly tied to #789)

Found while implementing aggregation; small enough that separating them would create churn.

  • _validate_index_array empty-array guard (observation/base.py) — jnp.any(indices < 0) on empty arrays returned False; now returns early on size == 0.
  • _validate_index_array / _validate_obs_dense ndim check — previously accepted non-1D arrays and relied on silent broadcasting; now rejects with a clear error.
  • AR(p) stationary-SD bound in test_ar_process_asymptotics — the bound |long_ts[-1]| < 3 * noise_sd was too tight (stationary SD strictly > innovation SD when autoregressive coefficients are non-zero) and latently flaky; replaced with closed-form stationary SD per order.
  • _validate_obs_times_shape_validate_shapes_match rename — prerequisite refactor; same shape-match logic needed for (obs, period_end_times) pairs.
  • _typos.toml: allow dows. pyproject.toml: register integration pytest marker.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyrenew/observation/count_observations.py Outdated
@cdc-mitzimorris
Copy link
Copy Markdown
Collaborator Author

cdc-mitzimorris commented Apr 23, 2026

@dylanhmorris and @damonbayer - this is ready for review.
there's a lot of changes to look at via the github interface - here's an analysis of how much of this is code changes

PR Analysis: mem_789_temporal_aggregation vs main

Excluding pyrenew/datasets/. "other" = blank lines and # comments.

Top-level totals

area doc+ doc- code+ code- other+ other-
pyrenew/ (sans datasets) 644 90 487 67 113 13
test/ 713 22 1,943 94 473 2

pyrenew/ per file (sorted by code churn)

file doc+ doc- code+ code- other+ other-
observation/count_observations.py 283 51 218 52 35 13
latent/temporal_processes.py 114 0 85 1 23 0
observation/base.py 89 12 76 10 18 0
model/multisignal_model.py 46 15 39 3 7 0
model/pyrenew_builder.py 20 5 37 0 7 0
time.py 54 3 11 0 19 0
latent/subpopulation_infections.py 10 2 10 0 0 0
latent/population_infections.py 9 2 4 0 0 0
arrayutils.py 15 0 3 0 4 0
observation/measurement_observations.py 4 0 2 1 0 0
latent/__init__.py 0 0 2 0 0 0

test/ per file (sorted by code churn)

file doc+ doc- code+ code- other+ other-
test_observation_counts.py 63 0 482 8 87 0
test_pyrenew_builder.py 67 1 312 10 45 0
test_observation_validation.py 53 9 221 30 54 2
integration/test_population_infections_he_weekly.py 165 0 223 0 70 0
integration/test_population_infections_he_weekly_rt.py 128 0 193 0 62 0
test_temporal_processes.py 22 0 153 1 33 0
conftest.py 142 0 100 0 71 0
integration/conftest.py 47 0 93 3 21 0
test_datagen_he_CA_126.py 8 6 32 24 5 0
test_population_infections.py 2 0 37 1 6 0
test_subpopulation_infections.py 2 0 38 0 4 0
test_time.py 8 0 32 0 9 0
integration/test_population_infections_he.py 1 1 7 9 1 0
test_ar_process.py 0 0 13 1 5 0
test_datasets_synthetic.py 5 5 7 7 0 0

Notes on the split

  • The code-vs-docstring split uses the Python AST of each file version to identify Module/ClassDef/FunctionDef leading string-literal bodies; every +/- line is classified by its line number in the respective revision. Triple-quoted strings that aren't docstrings (e.g., multiline literals inside function bodies) count as code.
  • Code churn is dominated by count_observations.py (the irregular/weekly aggregation landing for Handle temporal aggregation in observational data #789) and temporal_processes.py. observation/base.py and multisignal_model.py carry the dispatch plumbing for first_day_dow / obs_start_date. time.py and arrayutils.py are mostly docstring (new module-level prose around small helpers).

Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd
Comment thread pyrenew/latent/base.py Outdated
Comment thread pyrenew/latent/temporal_processes.py Outdated
Comment thread pyrenew/arrayutils.py
between R(t) parametrization cadence and observation aggregation.
"""

step_size: int
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given docs above about default. Consider setting to 1 in the base class and then overriding in inherited classes when appropriate? Open to the more explicit approach as well though.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damonbayer thoughts?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed we can do this either way, but there are a few reasons to keep it as is:

  • TemporalProcess is a typing.Protocol, which is meant to describe a structural shape, not to provide default values.
  • Yes, the current code has 3 lines of boilerplate but they're in the right place - reading AR1 on its own tells you its step_size is 1. design precept: explicit > implicit (PEP 20).
  • If someone wants to add another TemporalProcess, seeing that the existing subclasses declare step_size explicitly is a prompt to consider what behavoir applies; otherwise, they might overlook an inappropriate default.

more volatile trajectories; smaller values produce smoother ones.
"""

step_size: int = 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above about potential to avoid with inheritance.

Comment thread pyrenew/latent/temporal_processes.py
Comment thread pyrenew/latent/temporal_processes.py
Comment thread pyrenew/model/pyrenew_builder.py Outdated
@dylanhmorris
Copy link
Copy Markdown
Collaborator

dylanhmorris commented Apr 24, 2026

All observations sharing an aggregation_period > 1 must agree on period_end_dow.
Every temporal-process step_size must be a positive integer.
Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.

High-level comment: do we definitely need to enforce all weekly quantities sharing the same week?

I agree that in many cases a user will want this, but it's not a given. For example, you could imagine two weekly aggregate observables, one reported in MMWR epiweeks, the other in isoweeks. Similarly, while matching temporal process weeks to observation weeks makes sense to me as a default, I don't think we should enforce it.
https://github.com/CDCgov/PyRenew/pull/794/changes#r3140086415

Comment thread test/test_ar_process.py
inner
Inner ``TemporalProcess`` that generates the coarse-scale
trajectory. Must satisfy the ``TemporalProcess`` protocol.
step_size
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments here.

  1. The interdependencies here feel bad.

alignment = "calendar_week" => step_size = 7 and week != None
alignment != "calendar_week" => week = None

Is there a better way to structure this?

  1. Can / should we use "week" instead of "calendar week"?

  2. Do we really need non-None week for alignment = "calendar_week", or could we use the data to inform what the week should be?

Comment thread pyrenew/latent/base.py
Comment on lines +359 to +365
for value in vars(self).values():
if (
isinstance(value, TemporalProcess)
and getattr(value, "alignment", None) == "calendar_week"
):
return True
return False
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaner to use a generator expression?

Suggested change
for value in vars(self).values():
if (
isinstance(value, TemporalProcess)
and getattr(value, "alignment", None) == "calendar_week"
):
return True
return False
return any(
isinstance(value, TemporalProcess)
and getattr(value, "alignment", None) == "calendar_week"
for value in vars(self).values()
)

subpop_fractions
Population fractions. Defaults to ``[1.0]`` (single population).
Must be ``[1.0]`` if provided.
first_day_dow
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does the first_day occur in terms of model time? Can we be more clear about that?

Comment thread test/conftest.py
f"R(t) 90% CI coverage was {coverage:.1%}, expected >= 80%"
)

def test_ascertainment_rates_recover_order_of_magnitude(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you decide which parameters to assess and why do you use different assessments? (95% coverage vs within a factor of 5)

Comment thread test/integration/test_population_infections_he_weekly_rt.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants